ABSTRACT

This study modeled school and district effects in the 
mathematics scores of the Delaware Student Testing Program (DSTP) using 
hierarchical linear modeling. Three-level hierarchical models were fitted to 
estimate school and district effects in the DSTP mathematics scores and to 
examine the school and district variance with variables assessing student 
characteristics entered as predictors at each level. Data were collected in 4 
waves, of which 3 were available for analyses: (1) 8,061 third graders in 66
schools and 15 districts who took the test in 1998; (2) 8,066 fifth graders in 43 schools
and 15 districts who took the test in 2000; and (3) matched data from 6,872 
of the same students who took the third grade test in 1998 and the fifth 
grade test in 2000. Hierarchical linear modeling showed that the proportions 
of the variance at the school and district levels in the total variance of 
the DSTP scores for 1998 and 2000 were very small. Different patterns in the 
composition of variance at the school and district levels were observed for 
grade 3 and grade 5 students. For grade 3, the variance is predominantly at 
the school level, while for grade 5, the variance is nearly equal between the 
school and district levels. Results of the analysis also indicate that, in 
addition to the racial, gender, and socioeconomic status gaps commonly found 
in studies of mathematics performance, students who changed their schools, on 
average, had lower performance than those who stayed in the same school 
between grade 3 and grade 5. Results indicate the differential effect of 
Delaware schools on their students' performance in mathematics on 
standardized tests, but the reasons for the differences are not known. 
(Contains 7 tables and 17 references.) (SLD) 
Modeling School and District Effects in the Math Achievement of
Delaware Students Measured by DSTP: An Application of 
Hierarchical Linear Modeling in Accountability Study 

Background 

The term “school effects” is found throughout the educational literature. 
Numerous studies have been done to identify effective schools and factors associated 
with the success or failure of schools. It has been pointed out (Raudenbush & Willms,
1995) that “school effect” usually refers to the extent to which attending a particular 
school modifies a student's outcome. This conception underlies current policy initiatives 
that aim to hold individual schools accountable for their contributions to student learning. 
A similar conception can be applied to the educational policy that intends to hold 
accountable the units higher than schools, for example, school districts. The publication 
of the ranking of both schools and districts in Delaware based on their student 
performance in the Delaware Student Testing Program (DSTP) is one example of such an 
effort. 



In 1996, Delaware initiated a statewide assessment of student learning in 
mathematics, reading, writing, science, and social studies, as a response to the standards 
movement that began in 1991 and the adoption of Content Standards for all major school 
subjects in 1995 (Woodruff, 2000). In 1997, the Delaware General Assembly passed 
legislation that made the DSTP the official measure of progress towards the Delaware 
content standards and the major measurement tool for the state’s new accountability 
system for students, schools, and districts. The law also established a system of school 
accountability based on student performance on the DSTP that holds all schools 
accountable. Each year since 1998, students in grades 3, 5, 8, and 10 have been tested in
mathematics, reading, and writing. Science and social studies were added in the spring of
2000 for students in grades 8 and 11, and in the fall of 2000 for students in grades 4 and
6.



The capability of decomposing and modeling the variance in student achievement 
scores on the different levels of our school system is a vital condition for a sensible 
system of accountability. Our study used the DSTP math scores as an example and tried 
to answer the following questions: How much variability in the student math performance 
is among schools and among school districts? How much of the observed variance 
among schools and districts can be attributed to the individual student background 
characteristics? And how much of the variance can be attributed to other factors on the 
school and district levels? 

Statistical Analysis of School Effects 

Educational researchers are interested in comparing schools or school districts in 
terms of the achievement of their students. The popularity of public accountability
systems in education calls for fair and scientifically valid approaches to estimating
school and district effects. It is commonly held among researchers that characteristics of
students may undermine the fairness of judging schools or districts on the same basis
(Yen, Schafer, & Rahman, 1999). Student background information, such as prior
achievement, ethnicity, and socioeconomic status (SES), should be adjusted for, and such
adjustments should be made at both the student and school levels (Caldas & Bankston,
1997).



There have been numerous explorations of appropriate methodologies to
adjust for the impact of student demographic variables when estimating school or district
effects. One approach is to regress school mean achievement scores on school means of
one or more background variables (for example, the average SES of the students in a
school). The school effect is then each school’s residual from the regression. This
approach may be adequate to the extent that there is minimal within-school variance (in
other words, schools impact all their students in similar ways). In most cases, however,
such an aggregation results in loss of information and biased estimates.
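The aggregate approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `school_mean_residuals` and the five school-level values are hypothetical and are not DSTP data.

```python
from statistics import mean


def school_mean_residuals(school_ses, school_score):
    """Regress school mean scores on one school mean background variable.

    Returns one residual per school; in the aggregate approach this
    residual is read as the school effect.
    """
    x_bar = mean(school_ses)
    y_bar = mean(school_score)
    sxx = sum((x - x_bar) ** 2 for x in school_ses)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(school_ses, school_score))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x)
            for x, y in zip(school_ses, school_score)]


# Hypothetical school-level aggregates, for illustration only
pct_lunch = [10.0, 25.0, 40.0, 55.0, 70.0]       # percent free/reduced lunch
mean_math = [455.0, 440.0, 438.0, 415.0, 412.0]  # school mean math score
effects = school_mean_residuals(pct_lunch, mean_math)
```

Because only school means enter the regression, any variation among students within a school is discarded, which is the loss of information noted above.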

An approach more sophisticated than simple aggregation to higher units of
analysis involves the aggregation of residuals from the student-level regression models
and the use of the average deviation to indicate school effects. This approach takes account
of individual characteristics and has been adopted in some recent studies (for example,
Felter & Carlson, 1985; Saka, 1984; Webster & Olson, 1988). The concern with this
approach is that it ignores the variation that actually exists at the school level. For example, the average student
characteristics of a school may have an effect on student achievement above and beyond
the effect of the individual student’s characteristics. Such an effect reflects the school’s
resources and environment, which provide a common experience for its students.
Misestimated standard errors may occur when such dependence among the students
within the same school is not modeled (Bryk & Raudenbush, 1992).

According to Goldstein (1987, 1995), an analysis that explicitly models the
structure in which students are grouped within schools has several advantages. First, it
can produce statistically efficient estimates of regression coefficients. Second, by
incorporating the clustering information, it can provide standard errors and significance
tests that are generally more conservative than those from traditional regression analysis. Third, it
makes it possible to explore the extent to which differences in average achievement
between schools can be accounted for by organizational factors as well as
other characteristics of students, and the extent to which schools differ for different kinds
of students. Finally, it enables the relative ranking of individual schools for public
accountability systems after adjusting for students’ intake achievement on top of other
student level and school level characteristics.

Among the various models proposed for multilevel analysis, Hierarchical Linear
Modeling (HLM), systematically introduced by Bryk and Raudenbush (Raudenbush &
Bryk, 1986; Bryk & Raudenbush, 1988, 1992; Raudenbush & Willms, 1995), has gained the most
popularity among educational researchers (Kreft & De Leeuw, 1998). HLM provides
estimates of linear equations that explain outcomes for group members as a function of
the characteristics of the group as well as the characteristics of the members. It is
relatively easy to implement and interpret, and thus it has been called the “model of
choice” and has been widely used to estimate school effects (Young, Reynolds, & Walberg, 1996;
Yen, Schafer, & Rahman, 1999; Webster, Mendro, Orsak, & Weerasinghe, 1998; Morris,
1995) and school district effects (Hargrove & Mao, 1997).

Method 

Our study modeled the school and district effects in the math scores of the 
Delaware Student Testing Program using Hierarchical Linear Modeling. Three-level 
hierarchical models were fitted to estimate school and district effects in the DSTP math 
scores, and to examine the school and district variance with variables assessing student 
characteristics entered as predictors at each level. 

Data 



To date, four waves of DSTP data have been collected (1998, 1999, 2000,
2001), which include both students’ test scores and key demographic information about
the students. The data from the first three years are currently available for analyses. The
release of the 2000 data provides the first chance to study a specific cohort, namely, the
3rd graders tested in 1998 and then tested again in 2000 as 5th graders.

The study used three data sets. The first set (grade 3 data) included the math 
scores of the grade 3 students who took the DSTP in 1998. There were 8061 students in 
66 schools and 15 districts. The second data set (grade 5 data) included the math scores 
of the grade 5 students who took the DSTP in 2000. There were 8066 students in 43 
schools and 15 districts. The third data set (called “matched data” in the rest of this 
paper) included pairs of scores of the same individuals who took the 3rd grade test in 1998 and
the 5th grade test in 2000. In this data set, there were 6872 students nested in 43 schools
and 15 districts. 

Model Specifications and estimation procedures 

Three-level linear regression models were applied to the three data sets described 
above. The dependent variable was the math standard based scores (SBS), reported on a 
scale that runs approximately from 150 to 800. 

Variables of student background information were specified as independent 
variables at each level. These variables are listed in Table 1. At the student level,
dummy variables were created to indicate race, gender, special education status, and
whether the student was assigned to the Title I reading program. Title I Reading status was
chosen as a proxy for the student’s SES because the available data do not contain a
direct assessment of the socioeconomic status of individual students. For the matched
data set only, three more variables were added. Grade 3 math scores in 1998 were used 
as an estimate of the student’s previous math achievement. It was also known from 
preliminary analyses that 72% of the students in the matched data set changed their 
schools and 13% changed districts before they moved up to grade 5. Two more dummy 
variables were thus created to indicate whether a student had changed
schools or districts between grade 3 and grade 5.

The variables specified at the school level include the size of school in terms of the 
number of students, and the proportions of students in a school who were white, female, 
or identified as special education students. The school’s average SES was assessed
by the percentage of students in that school who were eligible for free or reduced price 
lunch (information obtained from sources other than the DSTP data files). For the 
matched data alone, there was one more variable that indicated the proportion of students 
who had transferred into the school from other schools. 

At the district level, the variables included the district size (the number of
students), the proportions of students in a district who were white, who were identified as
special education students, and who were eligible for free or reduced price lunch.
Another variable was the proportion of the students in the district who were enrolled via
school choice. For the matched data set, there was also one more variable indicating the
proportion of students who had transferred from another district.

The models were estimated with the HLM 5 program for hierarchical linear modeling. For each
data set, the estimations started with an unconditional model (Model 0) to assess the
initial proportion of the variance at each level. In the following equations, i stands for
the individual student, j for the school, and k for the district. Model 0 did not include any predictors
at any level, and the equations were:



Level 1: $\text{Math}_{ijk} = \pi_{0jk} + e_{ijk}$
Level 2: $\pi_{0jk} = \beta_{00k} + r_{0jk}$
Level 3: $\beta_{00k} = \gamma_{000} + \mu_{00k}$

where

$\text{Math}_{ijk}$ is the math score for student i in school j and district k,
$\pi_{0jk}$ is the expected math score for school j in district k,
$e_{ijk}$ is the residual for student i in school j and district k,
$\beta_{00k}$ is the expected math score for district k,
$r_{0jk}$ is the residual for school j in district k,
$\gamma_{000}$ is the grand mean or the average of all students,
and $\mu_{00k}$ is the residual for district k.

The next step of estimation (Model 1) added the student level variables to the 
level 1 equation: 



Level 1: $\text{Math}_{ijk} = \pi_{0jk} + \pi_{1jk}(\text{WHITE})_{ijk} + \dots + e_{ijk}$

where WHITE is a dummy variable indicating whether a student is white or minority.
Other student level covariates are not shown in the equation above. 

All regression coefficients other than the intercepts were constrained to be 
constant within schools and districts so the models on level 2 and level 3 were: 
Level 2: $\pi_{0jk} = \beta_{00k} + r_{0jk}$
         $\pi_{1jk} = \beta_{10k}$

Level 3: $\beta_{00k} = \gamma_{000} + \mu_{00k}$
         $\beta_{10k} = \gamma_{100}$



At the third step (Model 2), the level 1 intercept ($\pi_{0jk}$, mean math achievement of
school j in district k) was regressed on the school level variables, and the level 1 equation
remained the same.

Level 2: $\pi_{0jk} = \beta_{00k} + \beta_{01k}(\text{S\_LUNCH})_{jk} + \dots + r_{0jk}$
         $\pi_{1jk} = \beta_{10k}$

Here S_LUNCH is the percentage of students eligible for free or reduced price
lunch in a school. Again, other covariates at the school level are not listed in the
equation for $\pi_{0jk}$.

Level 3: $\beta_{00k} = \gamma_{000} + \mu_{00k}$
         $\beta_{01k} = \gamma_{010}$



At the fourth step (Model 3, or the full model), the level 2 intercept ($\beta_{00k}$, mean
math achievement of district k) was regressed on the district level variables, with the
level 1 and level 2 equations unchanged.

Level 3: $\beta_{00k} = \gamma_{000} + \gamma_{001}(\text{D\_CHOICE})_{k} + \dots + \mu_{00k}$
         $\beta_{01k} = \gamma_{010}$

where D_CHOICE is the percentage of school choice students in a district. The
other covariates on the district level are omitted.

Indicators of school and district effects 

The above-specified model falls into the category of multilevel models with 
random intercepts and fixed slopes (Bryk & Raudenbush, 1992). For the purpose of 
estimating school and district effects, the following three equations are of most interest: 

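Level 1: $\text{Math}_{ijk} = \pi_{0jk} + \pi_{1jk}(\text{WHITE})_{ijk} + \dots + e_{ijk}$

Level 2: $\pi_{0jk} = \beta_{00k} + \beta_{01k}(\text{S\_LUNCH})_{jk} + \dots + r_{0jk}$

Level 3: $\beta_{00k} = \gamma_{000} + \gamma_{001}(\text{D\_CHOICE})_{k} + \dots + \mu_{00k}$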

In this study, school and district effects are defined as the unique effect for each
school ($r_{0jk}$) or district ($\mu_{00k}$) after controlling for the impact of the covariates on the three
levels. In other words, they are the deviations of the school or district average
performance from their expected performance.
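Substituting the level 2 and level 3 equations into the level 1 equation gives a single combined form, which may make these definitions easier to see. This is a sketch derived from the three equations of interest, using the fixed slopes $\pi_{1jk} = \beta_{10k} = \gamma_{100}$ and $\beta_{01k} = \gamma_{010}$ and leaving the omitted covariates elided:

$$
\text{Math}_{ijk} = \gamma_{000} + \gamma_{001}(\text{D\_CHOICE})_{k} + \gamma_{010}(\text{S\_LUNCH})_{jk} + \gamma_{100}(\text{WHITE})_{ijk} + \dots + \mu_{00k} + r_{0jk} + e_{ijk}
$$

Read this way, the fixed part of the model gives the expected score, and $\mu_{00k}$ and $r_{0jk}$ are the district and school departures from that expectation.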

Results 

Table 2 displays the estimated random effects of the unconditional model
(Model 0) for all three data sets. The results show that, on the whole, the proportions of
the among-school variance and the among-district variance in the total variance were
small. In fact, the variance among districts at grade 3 is trivial (0.8% of the total
variance). The variances among schools and among districts are more balanced in the
grade 5 data and in the matched data. For the grade 5 data (not matched), the two
proportions of the total variance are identical (3.4%).
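The proportions discussed here are shares of the total variance, obtained by dividing each level's variance component from the unconditional model by the sum of all three. The sketch below assumes that reading; the function name and the three variance components are hypothetical, chosen only so that the school and district shares come out equal, and they are not the Table 2 estimates.

```python
def variance_proportions(sigma2_student, tau_school, tau_district):
    """Share of total variance at each level of an unconditional
    three-level model (student, school, district)."""
    total = sigma2_student + tau_school + tau_district
    return {
        "student": sigma2_student / total,
        "school": tau_school / total,
        "district": tau_district / total,
    }


# Hypothetical variance components, for illustration only
shares = variance_proportions(sigma2_student=932.0,
                              tau_school=34.0,
                              tau_district=34.0)
```

With these made-up inputs, the school and district levels each hold 3.4% of the total variance and the student level holds the remainder.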

The estimated fixed and random effects of the series of estimations (Model 0 
through Model 3) for the three data sets are displayed in Tables 3, 4, and 5. 

For the grade 3 and grade 5 data sets, the inclusion of individual student characteristic
variables at the student level (see Model 1) reduced the within-school variance of math
scores by 24.6% and 19.2%, respectively. The between-school variances were also
reduced by 32.1% at grade 3 and by 32.9% at grade 5. For the grade 5 data, the student
level variables helped to reduce the district level variance by 11.9%. For the grade 3
data, however, they actually increased the district level variance.

For the matched data the student level variables, including the grade 3 math 
scores and the two variables about school and district changes, decreased the student 
level variance by over 65% and helped to reduce the variance on the school and the 
district levels by 28.9% and 67.3% respectively. 
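A minimal sketch of how such percentage reductions can be computed, assuming each is the change in a level's variance component relative to the preceding model in the sequence. The function name and the input values are hypothetical and are not taken from Tables 3 through 5.

```python
def variance_reduction(var_before, var_after):
    """Proportional reduction in a variance component between two
    nested models. A negative value means the component increased."""
    return (var_before - var_after) / var_before


# Hypothetical variance components, for illustration only
reduced = variance_reduction(1000.0, 754.0)   # component shrinks
increased = variance_reduction(8.0, 10.0)     # component grows
```

A negative result corresponds to the situation reported for the grade 3 district level variance, where adding student level predictors increased the estimated component.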

The school level predictors, when added to the model (see Model 2), did not
further reduce the school level variance in the grade 3 data, but they accounted for over
73% of the variance at the district level that had been increased at the previous step of
modeling. For the grade 5 data, the school level predictors reduced the among-school
variance by another 42.5%, but they slightly increased the variance at the district level.
For the matched data set, the school level variables further reduced the among-school
variance by 34.6% and also increased the among-district variance by 14.3%.

The final models with district level variables added in (Model 3) reduced the 
among-district variance to non-significance in all three data sets. These variables also 
slightly increased the among-school variance in both the grade 5 data set and the matched 
data set but not in the grade 3 data set. 

The estimated fixed effects on each level are also shown in Tables 3, 4, and 5.
On the student level, the results showed strong impacts of individual race, gender, special
education status, and SES (approximated by the Title I Reading status) on math
achievement in favor of white, male, non-special education students, as well as students
with higher SES. These effects were consistent across the data sets and significant even
after previous math achievement (grade 3 scores) had been accounted for in the matched
data models. Previous math achievement, in its own right, was the most significant
predictor of the outcome math scores ($\beta = 0.728$, $t = 83.025$, $p < 0.001$). At the same
time, changing schools and changing districts both had a negative impact on math
achievement.

The effects of the school level predictors varied across the three data sets. For
grade 3, only the school size and the percentage of students eligible for free/reduced price
lunch had significant effects, both negative, on the DSTP math achievement. For both
grade 5 and the matched data, the only significant variable on the school level was the
percentage of female students, which was negatively related to math achievement.

At the district level, district size had a significant negative effect on the math
achievement of grade 3 students. For the grade 5 data, both district size and the
percentage of students eligible for free or reduced price lunch had significant
negative effects. However, the percentage of special education students had a highly
significant positive effect ($\beta = 3.103$, $t = 5.771$, $p < 0.001$) in the grade 5 data. The
matched data also showed that this variable had a positive impact after all other variables
were controlled for. Neither district size nor the percentage of school choice students
was found to have a significant effect on math achievement for the matched data.

A revised, more compact model was fitted to the matched grade 5 data. The
results are shown in Table 6. According to this model, after their previous math scores
were controlled for, students would have lower performance if they were minority,
female, lower in SES, identified as special education students, or if they changed schools
from grade 3 to grade 5. The same students would perform worse if they were from a
school with a higher proportion of female students or of students entitled to free or reduced
price lunch. Counter-intuitively, they would actually perform better if their district had
a higher percentage of special education students or of students coming from another
district. This last model had 15 estimated parameters versus the 24 parameters in the full
model, and it fitted the data equally well, as indicated by the nonsignificant difference in the deviances
between the two models ($\Delta\chi^2 = 8.709$, $df = 9$, $p > 0.25$).
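The deviance comparison can be checked with a short calculation: the difference in deviances between nested models is referred to a chi-square distribution whose degrees of freedom equal the difference in the number of estimated parameters. The sketch below uses only the standard library; the helper names `chi2_sf` and `deviance_test` are hypothetical, and the tail probability is computed with a textbook series expansion of the incomplete gamma function rather than with whatever routine the original analysis used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the series expansion of the regularized lower incomplete
    gamma function, which is adequate for moderate values of x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = a
    for _ in range(1000):
        n += 1.0
        term *= z / n
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


def deviance_test(deviance_diff, params_full, params_reduced):
    """Likelihood-ratio style test for two nested models."""
    df = params_full - params_reduced
    return df, chi2_sf(deviance_diff, df)


# Values reported in the text: deviance difference 8.709, 24 vs 15 parameters
df, p_value = deviance_test(8.709, params_full=24, params_reduced=15)
```

With 9 degrees of freedom, a deviance difference of 8.709 gives a tail probability well above 0.25, which is consistent with the conclusion that the compact model fits as well as the full model.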

The school and district effects were estimated by calculating the residual terms on 
the school and the district levels for the revised model of the matched data. Table 7 lists 
the residuals at both school and district levels. As indicators of school and district effects 
of the grade 5 math scores, these residuals are ranked and the rankings (called Modeled 
Rank in the table) are displayed together with the rankings based on the average school 
and district math scores (called Raw Score Rank in the table). For some schools and 
districts, the discrepancies in the two rankings are huge (School #11, School #38, and 
District #14, for example). For others, the raw and modeled rankings are not far from 
each other. The correlation between the two rankings of schools is 0.7412, and the 
correlation between the two rankings of districts is 0.7393. 
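The comparison of the two rankings can be sketched as follows, assuming the reported correlations are ordinary correlations computed on the rank values (which, without ties, is Spearman's rank correlation). The helper names and the five schools' values are hypothetical and are not taken from Table 7.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical values for five schools, for illustration only
raw_means = [452.1, 438.7, 461.3, 429.9, 445.0]   # average math scores
residuals = [1.8, 3.2, -0.6, -2.9, -1.5]          # school level residuals

raw_rank = ranks_descending(raw_means)
modeled_rank = ranks_descending(residuals)
rho = pearson(raw_rank, modeled_rank)
```

In this made-up example, the second school has a below-average raw score but the largest residual, so it moves from fourth place on raw scores to first place on the modeled ranking; discrepancies of this kind are what lower the correlation between the two rankings.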
Discussion

The Delaware Student Testing Program (DSTP) is designed exclusively as an 
assessment of the academic achievement of Delaware’s public school students, rather 
than a research project intended to account for all the variation in student performance. 
The current study was based on available information and it examined only the math 
scores of DSTP. Nevertheless, this study is the first to decompose the variance of DSTP
performance at the student, school, and district levels, and it provides a potential
alternative to the current accountability system in Delaware.

The results of the current study indicate the following: 

First, hierarchical linear modeling showed that the proportions of the variance at the
school level and the district level in the total variance of the DSTP math scores (1998 and
2000) were very small (see Table 2). According to Bryk and Raudenbush (1992), the
variability among schools was normally 10 to 30 percent in school effect studies. At this
moment it is not clear what grade levels and what subject fields were modeled in those
studies and to what extent the current findings deviate from the norm.

Second, different patterns in the composition of variance at the school and district 
levels were observed for grade 3 and grade 5 students. For grade 3 the variance is 
predominantly at the school level, while for grade 5 the variance is nearly equal between 
the school level and the district level. The larger number of grade 3 schools (66 schools), 
and subsequently smaller number of students within each school, might have contributed 
to the larger among-school variation in the grade 3 math scores. 

Third, besides the racial, gender, and SES gaps commonly found in studies of 
math performance, the results from the current analysis indicated that on average students 
who had changed their schools had lower performance than those who stayed in the same 
school between grade 3 and grade 5, after the impacts of other student background 
characteristics including their previous math achievement were controlled for. The study 
needs to be replicated for other subject areas and grade levels before any generalizations 
can be made. In Delaware, such an observation may indicate a problem caused by a lack 
of consistency in school experience when students have to leave a certain school to be in 
a higher grade. 

Finally, while district level variability in math achievement could be easily 
explained away by one or two variables of student characteristics aggregated to that level, 
the variance at the school level remained significant even in the full model. In other
words, the models we have specified do not fully account for the variance at the
school level. From another perspective, this indicates a differential effect of
Delaware schools on their students’ math performance in standardized testing, after the
impacts of student background characteristics, including their previous achievement, 
were controlled for. The reasons for such a difference are not known from the current 
study. 
In further analyses, we expect to expand and modify the study in the following
ways. First, hierarchical linear modeling of DSTP scores will be extended to other 
subject areas, such as reading, writing, science, and social studies to explore the 
similarity and differences in school and district effects across these subjects. Public 
accountability systems usually ask for a composite value, rather than scores of a single 
subject, to stand for the relative efficiency of a school or a district. We need to explore 
the ways hierarchical linear models can be helpful in this aspect. 

Second, hierarchical linear modeling will be applied to the DSTP scores of all the 
grades in every year since 1998. The aim is to reveal any general pattern or change in the 
school and district effects in the first four years of the DSTP program. 

Third, more variables on the school level will be introduced to account for the 
part of variance that the student background characteristics alone cannot explain. It will 
help to answer an important question of what makes a school effective or ineffective after 
we take their students’ characteristics into consideration. 